Modern multi-view multimedia applications struggle between high-resolution (HR) visual experience and storage or bandwidth constraints. Therefore, this paper proposes a Multi-View Image Super-Resolution (MVISR) task, which aims to increase the resolution of multi-view images captured from the same scene. One solution is to apply image or video super-resolution (SR) methods to the low-resolution (LR) input views. However, these methods cannot handle large angle transformations between views or exploit the information in all the multi-view images. To address these problems, we propose MVSRnet, which uses geometry information to extract sharp details from all LR multi-view images to support the SR of the LR input view. Specifically, the proposed Geometry-Aware Reference Synthesis module in MVSRnet uses geometry information and all the multi-view LR images to synthesize pixel-aligned HR reference images. Then, the proposed Dynamic High-Frequency Search network fully exploits the high-frequency textural details in the reference images for SR. Extensive experiments on several benchmarks show that our method significantly improves over the state-of-the-art.
Under stereo settings, the performance of image JPEG artifact removal can be further improved by exploiting the additional information provided by a second view. However, incorporating this information into stereo image JPEG artifact removal is a huge challenge, since the existing compression artifacts make pixel-level view alignment difficult. In this paper, we propose a novel Parallax Transformer Network (PTNet) to integrate the information from a stereo image pair for stereo image JPEG artifact removal. Specifically, a well-designed symmetric bi-directional parallax transformer module is proposed to match features with similar textures between different views, instead of pixel-level view alignment. Owing to the issues of occlusions and boundaries, a confidence-based cross-view fusion module is proposed to achieve better feature fusion for both views, where the cross-view features are weighted by confidence maps. In particular, we adopt a coarse-to-fine design for cross-view interaction, leading to better performance. Comprehensive experimental results demonstrate that our PTNet can effectively remove compression artifacts and achieve superior performance compared with other state-of-the-art methods.
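The confidence-based cross-view fusion described above amounts to a per-position convex blend of the two views' features, with low confidence near occlusions and boundaries suppressing the cross-view contribution. The sketch below is a minimal 1-D illustration under that assumption; the function name and the list-based feature representation are ours, not the paper's:

```python
# Hypothetical illustration of confidence-weighted cross-view feature fusion:
# fused = conf * cross_view + (1 - conf) * self_view, applied element-wise.

def fuse_features(self_feat, cross_feat, confidence):
    """Blend features from the other view into the current view.

    All arguments are equal-length lists of floats; `confidence` holds
    per-position weights in [0, 1] (low near occlusions/boundaries).
    """
    assert len(self_feat) == len(cross_feat) == len(confidence)
    return [c * x + (1.0 - c) * s
            for s, x, c in zip(self_feat, cross_feat, confidence)]

# Where the confidence is 0 the cross-view feature is ignored entirely.
fused = fuse_features([1.0, 2.0], [3.0, 4.0], [0.0, 0.5])
```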
Recent studies of deep learning-based stereo image super-resolution (StereoSR) have promoted the development of StereoSR. However, existing StereoSR models mainly concentrate on improving quantitative evaluation metrics and neglect the visual quality of super-resolved stereo images. To improve perceptual performance, this paper proposes the first perception-oriented stereo image super-resolution approach, which exploits the feedback provided by an evaluation of the perceptual quality of the StereoSR results. To provide accurate guidance for the StereoSR model, we develop the first dedicated stereo image super-resolution quality assessment (StereoSRQA) model, and further construct a StereoSRQA database. Extensive experiments demonstrate that our StereoSR approach significantly improves perceptual quality and enhances the reliability of stereo images for disparity estimation.
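The feedback mechanism can be sketched as a training objective in which the quality score predicted by a StereoSRQA-style model penalises perceptually poor outputs. The combination rule, the weight `lam`, and the function name below are illustrative assumptions, not the paper's formulation:

```python
def perceptual_feedback_loss(fidelity_loss, qa_score, lam=0.1):
    """Combine a fidelity term with a penalty for low predicted quality.

    `qa_score` is a StereoSRQA-style quality prediction in [0, 1];
    higher is better, so (1 - qa_score) acts as the perceptual penalty.
    """
    return fidelity_loss + lam * (1.0 - qa_score)

# A perfect quality score leaves only the fidelity term.
assert perceptual_feedback_loss(0.5, 1.0) == 0.5
```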
Deep neural networks have greatly promoted the performance of single image super-resolution (SISR). Conventional methods still resort to restoring a single high-resolution (HR) solution based only on the input of the image modality. However, image-level information is insufficient to predict adequate details and photo-realistic visual quality when facing large upscaling factors (x8, x16). In this paper, we propose a new perspective that regards SISR as a semantic image detail enhancement problem, generating semantically reasonable HR images that are faithful to the ground truth. To enhance the semantic accuracy and visual quality of the reconstructed images, we explore multi-modal fusion learning in SISR by proposing a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize information from both the text and image modalities. Different from existing methods, the proposed TGSR can generate HR image details that match the text descriptions through a coarse-to-fine process. Extensive experiments and ablation studies demonstrate the effect of TGSR, which exploits the text reference to recover realistic images.
Although many studies have successfully applied transfer learning to medical image segmentation, very few of them have investigated the selection strategy when multiple source tasks are available for transfer. In this paper, we propose a prior knowledge guided and transferability-based framework to select the best source tasks among a collection of brain image segmentation tasks, to improve the transfer learning performance on the given target task. The framework consists of modality analysis, RoI (region of interest) analysis, and transferability estimation, such that the source task selection can be refined step by step. Specifically, we adapt the state-of-the-art analytical transferability estimation metrics to medical image segmentation tasks and further show that their performance can be significantly boosted by filtering candidate source tasks based on modality and RoI characteristics. Our experiments on brain matter, brain tumor, and white matter hyperintensities segmentation datasets reveal that transferring from different tasks under the same modality is often more successful than transferring from the same task under different modalities. Furthermore, within the same modality, transferring from the source task that has stronger RoI shape similarity with the target task can significantly improve the final transfer performance. Such similarity can be captured using the Structural Similarity index in the label space.
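The label-space similarity mentioned in the last sentence can be illustrated with a minimal single-window SSIM over flattened binary RoI masks. In practice one would use a windowed implementation (e.g. scikit-image's `structural_similarity`); this toy version, its constants, and its names are assumptions for illustration:

```python
def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM between two equal-length binary mask vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                       # means
    vx = sum((a - mx) ** 2 for a in x) / n                # variances
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

# Flattened binary RoI masks; identical shapes score the maximum of 1.
mask_a = [0, 0, 1, 1, 1, 0]
mask_b = [0, 0, 1, 1, 1, 0]
```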
Modern deep neural networks have achieved superhuman performance in tasks from image classification to game play. Surprisingly, these various complex systems with massive amounts of parameters exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have theoretically shown that the global solutions to the network training problem under a simplified "unconstrained feature model" exhibit this phenomenon. We take a step further and prove that Neural Collapse occurs for deep linear networks under the popular mean squared error (MSE) and cross entropy (CE) losses. Furthermore, we extend our research to imbalanced data for the MSE loss and present the first geometric analysis of Neural Collapse under this setting.
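For reference, the structure that "Neural Collapse" asserts (following Papyan et al.) can be stated through two of its standard components: within-class variability collapse and the simplex equiangular tight frame (ETF) geometry of the class means. The notation below is a common convention, not necessarily this paper's:

```latex
% NC1 (variability collapse): last-layer features concentrate on class means
h_{c,i} \longrightarrow \mu_c \quad \text{for every sample } i \text{ of class } c.
% NC2 (simplex ETF): recentred, normalised class means are equiangular
\tilde{\mu}_c = \frac{\mu_c - \mu_G}{\lVert \mu_c - \mu_G \rVert}, \qquad
\langle \tilde{\mu}_c, \tilde{\mu}_{c'} \rangle =
\begin{cases}
1, & c = c', \\
-\frac{1}{C-1}, & c \neq c',
\end{cases}
```

where $\mu_G$ is the global feature mean and $C$ is the number of classes; the imbalanced-data analysis concerns how this geometry deforms when the classes have unequal sample counts.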
In this paper we derive a PAC-Bayesian-like error bound for a class of stochastic dynamical systems with inputs, namely, for linear time-invariant stochastic state-space models (stochastic LTI systems for short). This class of systems is widely used in control engineering and econometrics; in particular, such systems represent a special case of recurrent neural networks. In this paper we 1) formalize the learning problem for stochastic LTI systems with inputs, 2) derive a PAC-Bayesian-like error bound for such systems, and 3) discuss various consequences of this error bound.
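To make the model class concrete: a stochastic LTI state-space system in innovation form is commonly written as follows (the notation is a standard convention, not necessarily the paper's):

```latex
x_{t+1} = A x_t + B u_t + K v_t, \qquad
y_t = C x_t + D u_t + v_t,
```

where $x_t$ is the hidden state, $u_t$ the input, $y_t$ the output, $v_t$ the innovation noise, and $(A, B, C, D, K)$ constant matrices; unrolling the state recursion over time is what makes such systems a special case of (linear) recurrent neural networks.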
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up inference by minimizing the number of inference steps, but at the cost of sample quality. In this work, to improve the inference speed of DDPM-based TTS models while achieving high sample quality, we propose ResGrad, a lightweight diffusion model which learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compared with other acceleration methods for DDPMs, which need to synthesize speech from scratch, ResGrad reduces the complexity of the task by changing the generation target from the ground-truth mel-spectrogram to the residual, resulting in a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training that model. We verify ResGrad on the single-speaker dataset LJSpeech and two more challenging datasets with multiple speakers (LibriTTS) and high sampling rate (VCTK). Experimental results show that, in comparison with other speed-up methods for DDPMs: 1) ResGrad achieves better sample quality at the same inference speed measured by real-time factor; 2) with similar speech quality, ResGrad synthesizes speech more than 10 times faster than baseline methods. Audio samples are available at https://resgrad1.github.io/.
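The change of generation target described above can be sketched in one dimension: during training the diffusion model's target is the residual between ground truth and the TTS output, and at inference the predicted residual is added back onto the TTS output. The 1-D list representation and the function names below are illustrative assumptions; the real model operates on mel-spectrograms via diffusion sampling:

```python
def residual_target(gt_mel, tts_mel):
    """Training target: what the diffusion model must learn to generate."""
    return [g - t for g, t in zip(gt_mel, tts_mel)]

def refine(tts_mel, predicted_residual):
    """Inference: add the predicted residual back onto the TTS output."""
    return [t + r for t, r in zip(tts_mel, predicted_residual)]

gt = [0.9, 0.5, 0.2]          # ground-truth "spectrogram" (toy 1-D)
coarse = [0.8, 0.6, 0.2]      # existing TTS model's coarse output
res = residual_target(gt, coarse)   # small-magnitude target, easier to model
```

With a perfect residual prediction, `refine(coarse, res)` recovers the ground truth; the residual has far smaller magnitude than the full spectrogram, which is what allows the refinement model to stay lightweight.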
Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model that predicts the fitness of protein mutants by leveraging both sequence and structure information and exploiting an attention mechanism. Our model integrates the local evolutionary context from homologous sequences, the global evolutionary context encoding rich semantics from the universal protein sequence space, and the structure information accounting for the microenvironment around each residue in a protein. We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship on 26 deep mutational scanning datasets. More importantly, we propose a data augmentation strategy that leverages the data from unsupervised models to pre-train our model. After that, our model can achieve strikingly high accuracy in predicting the fitness of protein mutants, especially for higher-order variants (>4 mutation sites), when fine-tuned using only a small number of experimental mutation data points (<50). The proposed strategy is of great practical value, as the required experimental effort, i.e., producing a few tens of experimental mutation data points for a given protein, is generally affordable for an ordinary biochemistry group and can be applied to almost any protein.
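The augmentation strategy, as described, amounts to pre-training on pseudo-labels produced by an unsupervised fitness predictor before fine-tuning on the scarce experimental labels. Below is a minimal sketch of the pseudo-labelling step, with a toy scoring proxy standing in for the unsupervised model; all names and the scoring rule are assumptions:

```python
def make_pseudo_dataset(mutants, unsupervised_score):
    """Label every candidate mutant with an unsupervised fitness proxy."""
    return [(m, unsupervised_score(m)) for m in mutants]

# Toy proxy only: fewer mutation sites -> higher predicted fitness.
# A real pipeline would use an unsupervised sequence model's score instead.
score = lambda mutant: 1.0 / (1 + len(mutant))

# Mutants are tuples of point mutations; pre-train on these pseudo-labels,
# then fine-tune on the (<50) experimentally measured mutants.
pretrain_set = make_pseudo_dataset([("A1V",), ("A1V", "K5R")], score)
```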
Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense. Among these methods, adversarial training has been drawing increasing attention because of its simplicity and effectiveness. However, the performance of adversarial training is greatly limited by the architectures of the target DNNs, which often leaves the resulting DNNs with poor accuracy and unsatisfactory robustness. To address this problem, we propose DSARA to automatically search for neural architectures that are accurate and robust after adversarial training. In particular, we design a novel cell-based search space specifically for adversarial training, which improves the accuracy and the robustness upper bound of the searched architectures by carefully designing the placement of the cells and the proportional relationship of the filter numbers. Then we propose a two-stage search strategy to search for both accurate and robust neural architectures. At the first stage, the architecture parameters are optimized to minimize the adversarial loss, which makes full use of the effectiveness of adversarial training in enhancing robustness. At the second stage, the architecture parameters are optimized to minimize both the natural loss and the adversarial loss using the proposed multi-objective adversarial training method, so that the searched neural architectures are both accurate and robust. We evaluate the proposed algorithm under natural data and various adversarial attacks, which reveals the superiority of the proposed method in finding both accurate and robust architectures. We also conclude that accurate and robust neural architectures tend to deploy very different structures near the input and the output, which has great practical significance for both the hand-crafting and the automatic design of accurate and robust neural architectures.
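The two-stage strategy can be illustrated on a toy scalar "architecture parameter" with quadratic surrogate losses: stage one optimises the adversarial objective alone, and stage two refines with the multi-objective sum. Everything below is a schematic assumption; the real method optimises supernet architecture parameters during adversarial training:

```python
def grad_desc(loss, x, lr=0.1, steps=200, eps=1e-5):
    """Minimise a 1-D `loss` by finite-difference gradient descent."""
    for _ in range(steps):
        g = (loss(x + eps) - loss(x - eps)) / (2 * eps)
        x -= lr * g
    return x

# Toy surrogate losses over a single architecture parameter `alpha`:
natural_loss = lambda a: (a - 1.0) ** 2      # best natural accuracy at a = 1
adversarial_loss = lambda a: (a - 3.0) ** 2  # best robustness at a = 3

# Stage 1: optimise for robustness only (converges toward 3.0).
alpha = grad_desc(adversarial_loss, 0.0)
# Stage 2: multi-objective refinement balancing both terms
# (for these symmetric quadratics, the optimum sits midway, at 2.0).
alpha = grad_desc(lambda a: natural_loss(a) + adversarial_loss(a), alpha)
```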